Effects of Memory Performance on Parallel
Abstract
We develop a new metric for job scheduling that includes the effects of memory contention among simultaneously executing jobs that share a given level of memory. Rather than assuming each job or process has a fixed, static memory requirement, we consider a general scenario wherein a process's performance monotonically increases as a function of allocated memory, as defined by a miss-rate versus memory size curve. Given a schedule of jobs in a shared-memory multiprocessor (SMP), and an isolated miss-rate versus memory size curve for each job, we use an analytical memory model to estimate the overall memory miss-rate for the schedule. This, in turn, can be used to estimate overall performance. We develop a heuristic algorithm to find a good schedule of jobs on an SMP that minimizes memory contention, thereby improving memory and overall performance.
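The idea of scoring a schedule from per-job miss-rate curves can be illustrated with a small sketch. It is not the analytical model or the heuristic from the paper, neither of which is spelled out in this abstract. As a stand-in, it assumes that two jobs run together in each time slot, that the shared memory is split between them in whole blocks so as to minimize their summed miss rate, and that all pairings are tried exhaustively. The job names, curve values, and function names are hypothetical.

```python
# Hypothetical miss-rate curves: CURVES[job][b] is the number of misses per
# 1000 accesses when the job is given b memory blocks (non-increasing in b).
CURVES = {
    "A": [1000, 800, 600, 300, 100],
    "B": [1000, 900, 500, 400, 100],
    "C": [1000, 600, 100, 100, 100],
    "D": [1000, 100, 100, 100, 100],
}


def miss_rate(curve, blocks):
    """Look up a job's miss rate, clamping to the last measured point."""
    return curve[min(blocks, len(curve) - 1)]


def pair_cost(curve_a, curve_b, total_blocks):
    """Assumed contention model: try every split of the shared memory
    between two co-scheduled jobs and keep the lowest summed miss rate."""
    return min(
        miss_rate(curve_a, k) + miss_rate(curve_b, total_blocks - k)
        for k in range(total_blocks + 1)
    )


def schedule_cost(schedule, curves, total_blocks):
    """Sum the pair costs over all time slots of a schedule."""
    return sum(pair_cost(curves[a], curves[b], total_blocks) for a, b in schedule)


def pairings(jobs):
    """Yield every way to split an even-length list of jobs into pairs."""
    if not jobs:
        yield []
        return
    first, rest = jobs[0], jobs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def best_schedule(curves, total_blocks):
    """Exhaustive search (not the paper's heuristic) for the cheapest pairing."""
    return min(
        pairings(sorted(curves)),
        key=lambda s: schedule_cost(s, curves, total_blocks),
    )


if __name__ == "__main__":
    best = best_schedule(CURVES, total_blocks=4)
    print(best, schedule_cost(best, CURVES, 4))
```

With these made-up curves, the search pairs each memory-hungry job with a job that needs little memory, which matches the intuition that a good schedule avoids running jobs with large memory demands at the same time. Exhaustive enumeration is only practical for a handful of jobs, which is one reason a heuristic would be needed at realistic scale.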
Similar resources
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP lookup process is a key bottleneck in routing due to the growth in routing table size, increasing traffic, and migration to IPv6 addresses. IP address lookup involves computing the Longest Prefix Match (LPM), for which existing solutions, such as BSD radix tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
Performance Prediction and Evaluation of Parallel Processing on a NUMA Multiprocessor
Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory multiprocessor systems in comparison with non-scalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations such as scheduling and synchronizing processes, accessing data from processors to memory models and allocating distributed memory space to different processors, are p...
Evaluation of Hardware Write Propagation Support for Next-Generation Shared Virtual Memory Clusters
Angelos Bilas (1), Liviu Iftode (2), and Jaswinder Pal Singh (1). (1) Department of Computer Science, Princeton University, Princeton, NJ 08544; (2) Department of Computer Science, Rutgers University, Piscataway, NJ 08855. Abstract: Clusters of symmetric multiprocessors (SMPs), connected by commodity system-area networks (SANs) and inter...
Performance Modeling with Pamela: An Introduction
In this report we present a new methodology for the performance prediction of parallel programs on parallel platforms ranging from shared-memory to distributed-memory (vector) machines. The complete methodology comprises the concurrent language Pamela (PerformAnce ModEling LAnguage), the program and machine modeling paradigm, and a novel performance analysis method, called "serialization analys...
Evaluating The Performance of Non-Blocking Synchronisation on Modern Shared-Memory Multiprocessors
Parallel programs running on shared memory multiprocessors coordinate via shared data objects/structures. To ensure the consistency of the shared data structures, programs typically rely on some form of software synchronisation. Unfortunately, typical software synchronisation mechanisms usually result in poor performance because they produce large amounts of memory and interconnection network ...